Bayesian reweighting of biomolecular structural ensembles using heterogeneous cryo-EM maps with the cryoENsemble method

Cryogenic electron microscopy (cryo-EM) has emerged as a powerful method for the determination of structures of complex biological molecules. The accurate characterisation of the dynamics of such systems, however, remains a challenge. To address this problem, we introduce cryoENsemble, a method that applies Bayesian reweighting to conformational ensembles derived from molecular dynamics simulations to improve their agreement with cryo-EM data, thus enabling the extraction of dynamics information. We illustrate the use of cryoENsemble to determine the dynamics of the ribosome-bound state of the co-translational chaperone trigger factor (TF). We also show that cryoENsemble can assist with the interpretation of low-resolution, noisy or unaccounted regions of cryo-EM maps. Notably, we are able to link an unaccounted part of the cryo-EM map to the presence of another protein (methionine aminopeptidase, or MetAP), rather than to the dynamics of TF, and model its TF-bound state. Based on these results, we anticipate that cryoENsemble will find use for challenging heterogeneous cryo-EM maps for biomolecular systems encompassing dynamic components.

. Schematic illustration of the standard cryoENsemble method and its iterative extension.The input includes a structural ensemble (depicted in grey), typically obtained from molecular dynamics simulations, and a cryo-EM map of the biological system under investigation.Each model from the prior structural ensemble is fitted into the reference cryo-EM density before the cryoENsemble calculations.Subsequently, density is generated for each structure and starting weights (w i ) are assigned.In the standard mode, Bayesian reweighting generates new weights (w′ i ), indicated in the figure by varying shades of grey, for each structure in the ensemble (posterior structural ensemble) as well as posterior density for the system (see Methods).In the iterative mode, following the standard reweighting, a sub-ensemble of structures that meets weight criteria is selected for another round of Bayesian reweighting, which generates new weights (w″ i ) for the sub-ensemble.This process is repeated until the agreement with the experimental data decreases (reflected by an increase in χ 2 ).The iterative mode returns the minimal set of structures that maintains good agreement with the experimental cryo-EM data and posterior density.
In the context of cryo-EM data, the experimental data points are defined as a set of voxels with a density exceeding a predetermined threshold value, the latter established based on the noise levels present in the data (see Methods).To take into account the resolution anisotropy that can be present in the map, we can filter the map based on local resolution prior to reweighting.The likelihood function, P(D|w, I) assesses the probability of observing a given set of experimental data (D), considering the actual ensemble of structures and their corresponding statistical weights ( w ).The prior probability term, P(w|I) , encapsulates the knowledge about the structural ensemble and weights ( w ).This knowledge is typically derived from the molecular dynamics ensemble prior to the incorporation of the experimental data.The prior can be constructed in several ways, but keeping in line with the BioEn methodology, we utilise Kullback-Leibler (KL) divergence ( S KL ) where S KL is defined as S KL = M i=1 w i ln w i w 0 i and both reference ( w 0 i ) and refined weights ( w i ) are normalised and positive, and M is the size of the ensemble.Generally, the reference weights of the prior structural ensemble ( w 0 i ) are selected from the uniform distribution, though they can also be set according to populations derived from biased MD simulations (e.g. from metadynamics 62 ).An additional hyperparameter, θ , describes our confidence in the initial structural ensemble.A high value of θ indicates high confidence in the MD simulations and gener- ated ensemble, causing the refined weights ( w i ) to stay close to the initial ones ( w 0 i ).Conversely, a low value of θ, suggests that the initial ensemble may be far from optimal, allowing the weights ( w i ) to deviate significantly from the starting one ( w 0 i ).θ is automatically selected during optimisation based on the developed automatic L-curve analysis.
The likelihood function is modelled via a Gaussian distribution 63 where ρ 0 n represents the experimental/reference density from the n-th voxel of our map, whereas ρ i n (σ ) is simu- lated density from the same voxel generated from the i-th model of the structural ensemble with the use of Gaussian functions with the width equal to σ , which is a nuisance parameter (see Methods).This likelihood function contains two additional parameters: the variance of the Gaussian likelihood σ 2 L , which is equivalent to the experimental error, and the scaling factor ( α ).We approximate σ 2 L using the variance of noise distribution outside of the molecular system density, while α and σ are estimated during the reweighting.To determine the optimal value of θ , we perform calculations over a range of θ values and use an automatic L-curve analysis with the Kneedle algorithm 64 .This allows the selection of a θ value that yields good agreement with experimental data (low χ 2 ) and also prevents overfitting (maintaining a small difference from the distribution of the initial weights).
Having defined both the likelihood and prior functions, we can express the negative log-posterior function, which we will minimise to find the optimal weights, along with nuisance parameters, using the log-weights optimisation as encoded in BioEn The execution of standard cryoENsemble calculations yields optimal (non-zero) weights for every structure in our structural ensemble, along with the values of θ , σ and α .A schematic of our methodology is shown in Fig. 1A.Additionally, we extended the cryoENsemble standard run into the iterative mode, where we select a sub-ensemble of structures from the initial run and perform subsequent rounds of cryoEnsemble.In this mode, we select Neff*M structures with the highest weight during each round, where Neff is an effective sample size (Neff = exp(S KL )).This iterative process continues until the agreement with the experimental data decreases (reflected by an increase in χ 2 ) (Fig. 1B).
Both approaches have been tested using two extensive synthetic cryo-EM datasets from the adenylate kinase and ribosomal nascent chain complex (Supplementary Fig. 1) and showed that they can accurately reproduce the structural properties of the underlying conformational ensembles from the heterogeneous and noisy cryo-EM data.

CryoENsemble reweighting of a structural ensemble of adenylate kinase
Adenylate kinase (ADK) is an enzyme that catalyses the transfer of a phosphoryl group from ATP to AMP.This enzyme comprises three domains (CORE, NMP and LID) and undergoes a significant conformational change from open (apo) to closed (holo) conformation upon ligand binding, with RMSD = 7.16 Å (Supplementary Fig. 1A).Both states have been structurally characterised by X-ray crystallography (PDB IDs: 1AKE for the closed 65 and 4AKE for open 66 conformation).To test our method, we generated synthetic density maps based on these structures using different populations of each state, resolution and noise levels, culminating in a total of 66 maps for reweighting (Supplementary Fig. 2).The prior structural ensemble consisted of structures obtained from two structure-based all-atom MD simulations (see Methods).
For each ADK dataset, consisting of a structural ensemble and a selected set of voxels from a combination of reference map and simulated map (Supplementary Fig. 3), we ran the cryoENsemble reweighting method both in a standard and iterative approach.Initially, we assessed the effectiveness of our methodology in reproducing the reference population of the open state used to generate the reference maps (Fig. 2A).Our findings from the standard cryoENsemble run indicate consistency across all density maps, which decreases with both a reduction in resolution (10 Å) and an increase in the noise level (10%) (Fig. 2A).The consistency was lower for reference (2) P(w, I) ≈ exp(−θS KL ) www.nature.com/scientificreports/maps with a very small (≤ 0.2) or very large (≥ 0.8) population of the open state.This disparity is likely due to our prior structural ensemble equally representing the open and closed states, a significant deviation from these reference maps.This discrepancy was resolved in the iterative mode, where by selecting progressively smaller subsets (Supplementary Fig. 4), we managed to achieve the correct population across all analysed open states (Fig. 2A).Following these calculations, we generated a reweighted and averaged cryo-EM map for each dataset and compared it with the reference density map to evaluate the impact of the reweighting (Fig. 2B).For the comparison, we applied two metrics recommended by the 2019 Cryo-EM Model Challenge 67 : the correlation coefficient (CC) and Segment Based Manders' Overlap Coefficient (SMOC) 68 as they capture global and local aspects of similarity between maps.
The correlation coefficient calculated between the reference maps and posterior densities revealed that the effect of reweighting is evident across all open state populations in each ADK dataset (Fig. 2B).The correlation coefficient for medium and low-resolution maps reaches values of up to 0.9 or 0.8 for low (1%) and high (10%) noise levels, respectively.Higher resolution maps (3 Å), which contain more detail, present a more significant challenge in reweighting the structural ensemble to achieve high correlation coefficients, in particular when high noise levels (10%) are present in the data (Fig. 2B).However, reweighting consistently yields higher correlation coefficients than those obtained with the prior weights (for 3 Å with high noise levels (10%) on average CC = 0.597 and CC = 0.557 for posterior and prior weights, respectively).We also compared our reweighting results with the correlation coefficients derived from maps generated based on the best single structure fit.In the majority of the cases, the entire ensemble obtained after reweighting provides a more accurate representation of the map than any individual structure (Fig. 2B).Interestingly, when lowering the map resolution, e.g. from 6 to 10 Å, the quality of a single structure fit increases and in the cases of either entirely open or closed state it is higher than the CC of the prior ensemble, and can even equal the value of the reweighted ensemble.The single structure CC deteriorates when maps are close to an equal mixture of open and closed states.Although using iterative mode does not significantly improve the correlation coefficient (Fig. 2B), this method selects smaller sub-ensembles with at least as good agreement with the data as the whole reweighted ensemble (Supplementary Fig. 4).These observations show that cryoENsemble not only can provide a reweighted structural ensemble but also inform on when a single structure may be sufficient to describe a cryoEM map satisfactorily.
For the second score, we used SMOC, which captures the local similarity between the reference map and the fitted model.We calculated the average SMOC score across all residues and models from the MD ensemble and observed that it improved after reweighting, in particular for the entirely open and closed states (Supplementary Fig. 5).With iterative mode, we observed small further improvement, especially again for maps capturing completely open or close state.Altogether, for both global and local metrics, we see a clear effect of the reweighting on the quality of the ADK structural ensemble, with the iterative mode having the highest impact on the estimation of the correct populations.We subsequently analysed how the weights of each model from the structural ensemble were updated during the reweighting.We found that cryoENsemble shifted the weights from the uniform prior distribution to correctly capture the reference map open/closed state population in standard (Supplementary Figs.6-11) and iterative (Supplementary Figs.12-17) reweighting.Finally, a visual comparison of the prior and posterior average density maps alongside the reference maps shows the impact of the reweighting, especially pronounced for the open state maps where posterior maps combine only the open state conformations (Supplementary Figs.18-20).
Overall, we have demonstrated the effectiveness of cryoENsemble in characterising discrete heterogeneity in cryo-EM maps.We derived weights that can generate a map in good agreement with the experimental data (Supplementary Figs.18-20) and are able to describe the correct populations of each state (Fig. 2A).Our reweighted structural ensemble better explains the experimental data, using both global (Fig. 2B) and local metrics (Supplementary Fig. 5), than the starting ensemble.Furthermore, our method can also suggest when a single structure fitted into the reference map is insufficient, highlighting the necessity of using a structural ensemble for heterogeneous cryo-EM map fitting.

Reweighting the structural ensemble of the ribosome-bound nascent chain of FLN5-6
The second system we used for the test is the FLN5-6 ribosome nascent chain complex encompassing the immunoglobulin-like domain (FLN5 protein), captured during its biosynthesis on the bacterial 70S ribosome (Supplementary Fig. 1B).The FLN5 is the fifth filamin domain (residues 646-750) of the Dictyostelium discoideum gelation factor, and its co-translational folding has been extensively studied through a combination of experimental and computational techniques [69][70][71][72] .The FLN5-6 nascent chain sequence also consists of the 31 amino-acid linker comprising the fragment of the subsequent filamin domain (FLN6) and the SecM stalling sequence 73 .For our study, we obtained a starting structural ensemble of FLN5-6 based on the previous all-atom structure-based MD simulations 74 encompassing a diverse set of 100 structures of the FLN5-6 nascent chain (Supplementary Fig. 21A,C).Additionally, we created reference density maps based on random selections of 10 structures from this structural ensemble (Supplementary Fig. 21B) (see Methods section).
We applied the cryoENsemble protocol to the prior structural ensemble using the combined data from the reference map and maps generated from the structural ensemble.We used the entire structural ensemble (100 models) for the reweighting, including the ten conformations used to generate the reference density maps.Average density maps were generated based on the prior and posterior weights, and their correlation coefficients with the reference density maps were calculated (Fig. 3A).The prior ensemble, despite its significant structural heterogeneity, already displayed a good agreement with the reference density maps with average CC varying between 0.8 (3 Å maps) and 0.95 (10 Å maps) for 1% noise and from 0.47 (3 Å maps) to 0.84 (10 Å maps) for 10% noise (Fig. 3A).After standard reweighting, the correlation increased in all cases, reaching a value close to 1.0 for low noise levels (1%) and 0.9 for high noise levels (10%), with the only exception of the 3 Å maps, which, as in the ADK case, present a more significant challenge in reweighting the structural ensemble, in particular at the higher noise levels (10%) where CC reached 0.54 versus 0.47 with the prior weights.This difficulty is further apparent upon examining the extent of density in this highly noisy system (Supplementary Fig. 21B).Iterative reweighting improved the CC further, especially for the highest resolution maps.A comparison with maps generated based on a single structure shows that, in contrast to the ADK system, a single structure cannot represent the dynamic heterogeneity present in the nascent chain cryo-EM maps for any of the systems we tested (Fig. 3A).We also evaluated the reweighted ensemble using SMOC metrics (Supplementary Figs.22), finding that the reweighting improved the agreement with experimental data both globally and locally in all cases.Additionally, in order to assess the structural similarity between the obtained reweighted ensemble and the ten structures used to generate the reference map, we used the Jensen-Shannon (JS) divergence.We found significantly closer matching values upon reweighting, especially upon iterative reweighting (Supplementary Fig. 23).The high initial correlation between the prior ensemble and reference map can result in relatively minor changes to the correlation coefficients after reweighting (Fig. 3A).However, we observed significant shifts from the uniform distribution of the weights of the prior structural ensembles due to reweighting (Fig. 3B) especially visible after iterative reweighting, which significantly narrowed down the structural ensembles, especially for the high-resolution maps (Supplementary Figs. 24, 25A).The weights of the ten models used for reference map generation (circled in Fig. 3B, Supplementary Fig. 24) are substantially higher than those of the remaining structures (e.g.5.6% vs. 0.5% on average for 3 Å maps with 1% noise), a trend not significantly affected by the resolution of the density map or its noise level.These observations highlight the sensitivity of our method, which became particularly apparent when we analysed the entire dataset to determine how many of the ten models used to generate the reference map received the highest weight after the standard reweighting (Supplementary Fig. 25B).For high-and medium-resolution maps (3 and 6 Å), our method assigned the highest weights to the correct models in all datasets.While lower-resolution maps (10 Å) posed greater challenges, over half of the reference models were correctly identified to receive the top ten highest weights.
As the maps were generated based on a selection of an ensemble of 10 structures (target ensemble), we can use this synthetic data to test cryoENsemble's capacity to retrieve not only structural ensembles that correspond to the cryoEM map but also the embedded dynamics.To analyse if local flexibility can be retrieved upon reweighting, we calculated the root-mean-square fluctuations (RMSF) for prior and reweighted ensembles and compared them to the target RMSF using root-mean-square deviation (Fig. 4).We observed that, in all cases apart from the very challenging map with 3 Å resolution and 10% noise, the description of the RMSF improved, especially with the use of iterative mode.Furthermore, we performed a principal component analysis (PCA) to study global dynamics for each ensemble and compared the three main principle components (PC1, PC2, PC3) with the target values.We found significant improvement for PC1 and PC2 (Supplementary Figs. 26 and 27), whereas PC3 proved more challenging to improve for some maps (Supplementary Fig. 28).
Overall, we have shown that cryoENsemble can retrieve structural ensembles that globally and locally capture structural (CC, SMOC, JS divergence) and dynamical (RMSF, PCA) aspects ingrained in the cryo-EM map that represents continuous heterogeneity.www.nature.com/scientificreports/

Dynamics of the ribosome-bound TF in complex with PDF
Upon binding to the ribosome, TF remains highly dynamic, making it a challenging system for structural studies.We applied the iterative cryoENsemble methodology to the cryo-EM map, which represents the E. coli 70S ribosome in complex with PDF, TF and MetAP 50 .PDF and MetAP also bind around the ribosomal exit tunnel and compete for the s uL22--uL32 protein region 59 .MetAP additionally has a secondary binding site, which overlaps with the TF one 50,59 .When TF is bound to the ribosome in the presence of PDF or MetAP, it exhibits reduced dynamics and is, therefore, easier to characterise via cryo-EM.
We exploited this and ran a long all-atom structure-based 75 MD simulation with TF bound to the surface of the 70S ribosome complexed with PDF (Methods).Despite the fact that the dynamics of TF is restricted in the MD simulations by the presence of the bound PDF, it remains mobile, in particular within the peptidylprolyl cis-trans isomerase (PPI-ase) domain region (Supplementary Fig. 29).We next reweighed the MD trajectory using iterative cryoENsemble and the available cryo-EM map (EMDB: 30611 50 ) (Supplementary Fig. 1C).Our initial ensemble was already in good agreement with the cryo-EM data, with a correlation coefficient (CC) of 0.68 and after four rounds of iterative reweighting, the agreement improved to CC = 0.73, respectively (Supplementary Fig. 30).Moreover, as the reweighting process increased the weights of selected members of the ensemble (Fig. 5A, Supplementary Fig. 30), we identified the best-fit model within the density map, with CC = 0.65 (Fig. 5B).The improvement of the agreement with the experimental data for the MD ensemble, upon the reweighting, underscores the significance of utilising a structural ensemble instead of a single model when analysing heterogeneous cryo-EM maps.The most significant change in the ensemble composition was evident in our clustering analysis (see Methods).We found that, of the five main clusters encompassing 77% of the total trajectory, only one had an increased population following the reweighting (cluster_4 from 7.7 to 12%).Cluster_2 remained at a similar level (12%), and the remaining three experienced decreased populations, particularly apparent for cluster_1 (Fig. 5A), comprising 40% of the MD trajectory, which dropped to 21.5% in the reweighted ensemble.
Interestingly, cluster_14 had the highest absolute increase in the population upon reweighting (from 0.8 to 12%).The three main clusters obtained from reweighting combine to 45.5% of the population (Fig. 5C).Altogether, these findings show that reweighting using cryoENsemble can significantly improve the quality of the MD ensemble and its agreement with the cryo-EM data.Importantly, the reweighting process is not a simple increase of weights for all structures with high CC, as we found no correlation between new weights and corresponding CC scores (Supplementary Fig. 31); however, the structures that have the highest weights also tend to have high CC scores -supporting our observation that an ensemble explains heterogenous cryo-EM data better than a single structure.
Additionally, when we compared the experimental cryo-EM map with the map obtained from the single bestfit model and the map representing the entire reweighted MD ensemble, the reweighted map is more similar to the cryo-EM when visualised at different density thresholds (Supplementary Fig. 32).This emphasises not only the necessity to use an ensemble of structures instead of a single structure to capture information from highly heterogeneous cryo-EM maps but also underscores the importance of reweighting.
Finally, we analysed the TF dynamics retrieved from the cryo-EM map using RMSF and the first principal component (PC1) from the PCA analysis of the reweighted ensemble.We found that the most dynamic region corresponds to the PPI domain (Fig. 5D), which is also visible in the three main structural states (Fig. 5C).By conducting PCA on the reweighted ensemble, we characterised the movement of the PPI domain embeded in the cryo-EM map further and found that it moves towards and away from the ribosome surface (Fig. 5E).These observations align with previous work, which depicted, through coarse-grained MD simulations, that this domain fluctuates between bound and unbound state 57 .Altogether, these results show that cryoENsemble captures both the structural ensemble and its possible dynamics from the corresponding cryo-EM map.

The unaccounted cryo-EM density corresponds to a TF-bound methionine aminopeptidase, not TF dynamics
The initial study of TF, MetAP and PDF assembly on the ribosome provided several low-resolution cryo-EM maps of the 70S ribosome in various configurations 59 .Notably, in the cryo-EM map of MetAP-PDF-TF (12.2 Å, EMDB:9778), the MetAP density was unannotated, and an additional density near TF was attributed to the possible dynamics of TF.A subsequent study obtained a higher resolution cryo-EM map (4.1 Å) of the 70S ribosome with MetAP, PDF and TF with again additional density near TF, but now annotated as a tertiary binding site for MetAP 50 .Seeking to clarify the nature of this additional density, we took advantage of the unique combination of the MD simulations and cryoENsemble.After accounting for the TF structural ensemble obtained upon reweighting, we observed that there is still an unaccounted density present (Fig. 6A), which confirms the suggestion of a tertiary binding site for MetAP 50 .To further validate this observation, we fitted the MetAP structure using a rigid-body procedure, starting with the orientation where the positively charged loops faced the ribosome surface, as indicated by biochemical studies to be a probable ribosome-binding mode 76 , and found a compelling overlap (Fig. 6B, Supplementary Fig. 33).
Next, we ran another structure-based MD simulation using this TF + MetAP structure.The resulting structural ensemble was combined with the previous TF structural ensemble, and we conducted iterative reweighting with cryoENsemble.After five iterations (Supplementary Fig. 34), we obtained a reweighted ensemble with 79 structures and a correlation coefficient of 0.76, which is higher than when using only the TF ensemble.We identified two main states for the TF and TF + MetAP ensemble, finding that the population of the MetAP in the cryo-EM map is ~ 40% (Fig. 6C,D and Supplementary Fig. 35).
These findings demonstrate how MD simulations, in combination with cryoENsemble reweighting, can help explain unmodelled and unaccounted for parts of cryo-EM density maps corresponding to dynamic regions of biomolecular complexes with potential compositional heterogeneity.

Discussion
Characterising complex biological processes through cryo-EM presents many unique challenges, especially for systems that are dynamic, exist in multiple conformational or compositional states.Among such systems is the ribosome-bound trigger factor, which possible dynamics we have elucidated using cryoENsemble, a method that takes advantage of both the molecular dynamics simulation and Bayesian methodology to yield accurate structural and dynamic representations of complex biomolecular systems.Notably, unlike most of the existing methods, it adjusts the weights of the structural ensembles to improve their agreement with 3D cryo-EM maps, rather than fit or refine a single structure.Our approach uses Bayesian inference to reweight pre-calculated structural ensembles, which differs from EMMI 39 , where inference is applied during the MD simulations to refine the ensemble with data from the cryo-EM map.
The effectiveness of our method highly depends on the quality of the prior structural ensemble since our reweighting strategy by design does not produce new conformations.The advantage of this approach is that it is computationally less demanding, easier to set up and can use structural ensembles obtained from any type of molecular dynamics simulations (all-atom or coarse-grained force fields).Additionally, we can combine multiple different ensembles to study compositional heterogeneity.However, if the prior ensemble is too different from the cryo-EM map (e.g.low correlation coefficient upon reweighting), it may be necessary to restart the simulation in a different force field, from a different starting structure or use enhanced sampling methods to sample conformational space more efficiently.Finally, as new, more refined maps emerge, we can rerun the cryoENsemble without the need to restart the MD simulations, allowing us to use the method alongside ongoing cryo-EM map processing.
In this study, we used all-atom structure-based models 75 to sample available conformational space efficiently.While structure-based potentials have been previously used to fit models in the cryo-EM maps via MDFit 78 , we have instead employed them to generate a prior structural ensemble.Despite approximations, structure-based models allow exploration of the dynamics of large biological complexes that are otherwise inaccessible to more detailed computational approaches and can accurately describe their functional dynamics 79,80 .Coarse-grained simulations, once converted to the all-atom resolution, could be used in a similar manner to generate prior structural ensembles for cryoENsemble, thereby further expanding the accessible system size and complexity.
For more detailed systems, the use of more advanced force fields, such as CHARMM36m 81 or DES-AMBER 82 , may be necessary to generate more apt initial structural ensembles -potentially even guided by density-driven MD simulations 26 , where lower resolution map or one of the half-maps can be used to restrict sampled conformational space.Moreover, the structural ensembles derived from MD simulations with enhanced sampling methods can also be used to further increase the capacity to extensively sample the conformational landscape.In this scenario, weights obtained from the reweighted MD simulations would serve to define our initial ensemble.
CryoENsemble can also be used to improve or test molecular dynamics simulations.By setting up multiple simulations with different force fields, cryoENsemble can assess which of the force fields most accurately captures the cryo-EM data.Furthermore, this method could be integrated into various force field parameterisation schemes, thereby enabling the utilisation of cryo-EM data [83][84][85] .
In standard cryoENsemble reweighting, the weights can't be equal to 0 (see method description).By introducing the iterative mode, where in each step we select a sub-ensemble of structures (Fig. 1), we define the minimal set of structures that achieves a good agreement with the cryo-EM map.Each new sub-ensemble is treated as a new prior ensemble with uniform weights and undergoes a standard cryoENsemble run.The iterative mode improved the results in both of our test systems (ADK and FLN5-6 + 31 RNC).
While our method is computationally efficient, the reweighting time and memory usage can depend on the size of both the cryo-EM map (number of voxels) and the structural ensemble.This can be mitigated by clustering the MD structural ensemble before the reweighting to eliminate highly similar structures, as each requires calculating and storing a density map.The largest structural ensemble we tested comprised 2000 structures, which should suffice to capture the heterogeneity present in the cryo-EM reconstructions for most of the cases.We used a maximum of ~ 120,000 voxels from the cryo-EM map; however, one can apply initial down-sampling of the map to reduce the number of voxels for particularly large datasets.This method can also be helpful when working with large structural datasets.A stepwise reweighting with a downsampled map can be applied to obtain a minimal set of structures, which can be subsequently reweighted again using the high-resolution map.In a similar fashion, cryo-EM maps can be split into sections with varying resolutions and noise levels or be utilised through separate half-maps.Reweighting the structural ensemble first to the much more refined maps and then subsequently reweighting it with a less resolved map region could therefore help mitigate some of the challenges with highly heterogeneous maps.
Through the use of two synthetic datasets and one experimentally derived cryo-EM map, we have demonstrated that cryoENsemble can generate structural ensembles with averaged density maps closely mirroring the experimental maps and accurately reproducing the structural and dynamic properties of the underlying conformational ensembles.We have also presented that a fitted structural ensemble captures experimental data better than a single structure in these cases.
Each 3D cryo-EM map is obtained through extensive processing involving particle-sorting steps like 2D and 3D classification and thus represents only a fragment of the conformational space sampled by the system under study, which limits our complete understanding of its dynamics.Depending on the number of iterative 3D classifications and sorting rounds driven by the system dynamics and the scientific question, the maps can combine different levels of heterogeneity and dynamics.Developing methods such as cryoENsemble, which can capture dynamics in well-defined maps and model the various types of heterogeneity in low-resolution maps, will enable the user to choose maps in earlier processing stages, reducing the time from data collection to model.
The cryoENsemble approach is particularly suited for complex biological systems featuring convoluted dynamics.However, highly dynamic systems may be challenging if they do not present detectable cryo-EM density.Systems accessible to the cryoENsemble approach often yield cryo-EM maps with high-resolution regions associated with more static components and lower-resolution and ambiguous cryo-EM density describing dynamic elements.This includes nascent chain polypeptides or ribosome auxiliary factors bound to the ribosome.In these cases, the rigid and well-resolved structure of the ribosome contrasts with the low-resolution cryo-EM density of the NC or auxiliary factor.The dynamic character of these components implies the search for a solution in the form of a true structural ensemble rather than a selection of structures, which individually www.nature.com/scientificreports/fit into the density or just a single structure 43 .We have also shown how cryoENsemble can be applied to analyse unaccounted densities and compositional heterogeneity, which allowed us to model TF-bound methionine aminopeptidase.By enabling the analysis of cryo-EM maps for regions that are more dynamic and therefore have less well-defined density, our method opens up new avenues for structural studies.Finally, cryoENsemble can be extended to utilise other experimental data within the BioEn or similar framework, making it a potent tool for integrative structural biology.

Methods
We tested cryoENsemble using two synthetic datasets.The first one is adenylate kinase (ADK) in both the open and closed conformations, capturing the discrete heterogeneity present in cryo-EM maps.The second system is a ribosome-bound nascent chain of the immunoglobulin-like domain (FLN5-6 + 31), exemplifying continuous heterogeneity.Characterising ribosome-bound nascent chains using cryo-EM is especially challenging, given that they combine flexible and predominantly unstructured linkers in the exit tunnel with folded or partially folded domains outside of the exit tunnel 69 ; the latter are only transiently interacting with the ribosome 72 .

Generating the synthetic reference density maps
In our Bayesian framework, under typical circumstances, the reference density map would correspond to the experimentally derived cryo-EM map ρ ref = ρ exp .However, to test our methodology, we utilised synthetic ref- erence density maps.They were either generated based on the crystal structures of the open or closed ADK state ρ ref = ρ X−ray or on randomly selected models from the all-atom MD ensemble of the FLN5-6 + 31 ribosomal nascent chain ( , where ρ i is a density map of the i-th model).These synthetic reference density maps were generated using a protocol that mimics molmap command from ChimeraX with the bandwidth of the blur kernel σ set at 0.225 × resolution 86 .Maps were produced at three differing resolutions (3, 6, and 10 Å) to explore the influence of resolution on the reweighting process.To further investigate the effect of noise on reweighting, we added different levels of Gaussian noise to the map.The noise had a mean of 0 and a standard deviation based on either 1% or 10% of the map's maximum density.

Generating the density maps for the prior structural ensemble
In addition to the synthetic reference density maps (ρ ref ) , we generated density maps for every structure from the MD ensemble (ρ i ) .If not initially aligned, each structure was aligned to the reference density map using Situs 20 .Following this alignment and using an approach similar to one from the modified gmconvert script 40 , density maps were generated with the same voxel size, number of voxels and origin as the reference density map.The process involved positioning a spherical 3D-Gaussian function at each atom position with parameters for the corresponding atom obtained by fitting the electron atomic scattering factors specific to each atom type 40,87 .

Synthetic density map processing
The generated density maps, both the reference and those from the structural ensemble, were further processed using mrcfile python library 88 .From our reweighting dataset, we excluded voxels with negative values and rescaled the remaining ones to a molecular density value of 1, making the different maps easier to compare.Our reweighting methodology operates only on the selected voxels, both from the reference density map and the density maps generated based on the MD ensemble, that have density above the corresponding thresholds.The reference density map threshold is set up to be equal to 3 * σ noise , where σ noise was either 1% or 10% of the maximum density, whereas the threshold for maps generated based on MD was equal to 3*σ map , where σ map is the standard deviation of the synthetic map (Supplementary Fig. 3).

Generation of adenylate kinase synthetic cryo-EM densities
We generated synthetic density maps based on available X-ray structures (1AKE for closed and 4AKE for open conformation), and for the final reference map, we averaged the different populations of open and closed states maps, starting from fully open state conformation and changing the population progressively using 10% intervals until the fully closed conformation was arrived at.In our test protocol, we operated under the assumption that during the cryo-EM image processing, these states could not be separated into individual 3D reconstructions but were averaged into a single one.We produced 11 averaged reference maps at three different resolutions (3, 6 or 10 Å) and with varying levels of Gaussian noise (with a mean of zero and a standard deviation corresponding to 1% or 10% of the maximum ADK density) (Supplementary Fig. 2).In total, we generated 66 synthetic maps for analysis.

Generation of the prior structural ensemble for adenylate kinase
Two short (1.5*10 7 steps) molecular dynamics simulations, carried out in GROMACS 4.5.7 89 , were used to obtain the prior structural ensemble.We used an all-atom structure-based model generated in SMOG 2.0 75 with native contacts defined with the Shadow map algorithm 90  For our study, we generated a starting ensemble by randomly selecting 100 conformations of the NC from the FLN5-6 structural ensemble obtained from the previous all-atom structure-based MD simulation 74 (Supplementary Figs.1B and 21A).This ensemble exhibits significant structural heterogeneity, with RMSD values up to 28 Å (Supplementary Fig. 21C), reflecting the dynamic nature of the RNCs.To obtain the reference density maps, we randomly selected ten structures from this starting ensemble, generated an average density map at one of the three different resolutions (3, 6 or 10 Å), and repeated this procedure 100 times with Gaussian noise, corresponding to either 1% or 10% of the main density added (Supplementary Fig. 21B).This system enables us to evaluate our methodology in the case of continuous heterogeneity present in the cryo-EM maps.

Preparation of the trigger factor cryo-EM map for reweighting
For the final system, we used an experimentally derived cryo-EM map capturing the dynamics of the ribosomeassociated chaperone (trigger factor) bound to the ribosome in the presence of the peptide deformylase and excess of methionine aminopeptidase (Supplementary Fig. 1C) 50 .The obtained cryo-EM map had a clear density for the 70S ribosome, TF and PDF, which enabled authors to fit and refine models.However, the presence of the incomplete MetAP density suggested a novel tertiary binding site but did not allow for modelling the bound state.To create a reference map for the reweighting process, we used ChimeraX to select only the density that corresponds to either the Trigger Factor (TF) or the unmodeled MetAP (Supplementary Fig. 1C), subsequently saving it as a smaller, cropped map.This map was then normalised using the same methodology that we previously outlined for the synthetic reference map.We used the map threshold suggested by the authors (0.005) to select significant voxels for the reweighting.

Generation of the prior structural ensemble for the TF system
Using the available structure of the 70S ribosome from E.coli with bound TF and PDF (from PDB ID: 7D80 50 ), we prepared a starting structure for the MD simulation that encompassed the surface of the 70S ribosome around the ribosomal exit tunnel and bound both TF and PDF (Supplementary Fig. 36).We used an all-atom structurebased model generated with SMOG 2.4.4 75,91 with bond lengths and angles based on the AMBER03 force field 92,93 .
Native contacts that are used in structure-based potential were defined based on TF cryo-EM structure with the use of the Shadow Map 90 .For the structure-based MD simulations setup in SMOG, reduced units were applied with length, time, mass and energy scale all set to 1, except for the Boltzmann constant, which is k B = 0.00831451 (kJmol −1 K −1 , default in GROMACS).Simulations were performed for 5*10 8 steps in GROMACS 2021.2 94 in NVT ensemble at a reduced temperature of 0.5 (60 in GROMACS units), which is slightly below the temperature for this model to capture physiological conditions (0.582 reduced unit 92 ).The constant temperature was maintained via the Langevin Dynamics protocol.Taking advantage of a recent comparison of diffusion coefficients in the SMOG model and an all-atom explicit-solvent model 95 , we estimated the effective simulated time to be in a range of hundreds of microseconds.During simulations, we kept the atoms of the ribosome surface frozen.We sampled the trajectory every 5*10 5 steps generating 1000 structures, and clustered them based on the RMSD using the gmx cluster method from GROMACS (Fig. 5A).Obtained structural ensemble, we used as a prior during the reweighting process carried out in cryoENsemble.

Fitting of the MetAP and generation of the prior structural ensemble for the TF + MetAP system
To isolate the MetAP cryo-EM density, we utilised the ChimeraX 86 command 'volume subtract' to create a difference map between the original (EMDB: 30611 50 ) and the posterior map derived from cryoENsemble reweighting of the TF MD trajectory.The E.coli methionine aminopeptidase structure (PDB ID: 1MAT 77 ) was fitted into the obtained density using ChimeraX, orienting positively charged loops towards the ribosome, in accordance with previous studies 76 .For subsequent rigid-body fitting, we utilised the 'Fit in Map' command, setting the simulated map resolution to 8 Å. Generated TF + MetAP structure was used as a starting structure in the all-atom structure-based MD simulations with the same setup as the TF ensemble.The trajectory was sampled every 5*10 5 steps, generating 1000 structures, which were clustered based on the RMSD using the gmx cluster method from GROMACS.The obtained structural ensemble was combined with the TF ensemble and used as a prior during the reweighting process carried out in cryoENsemble.

Figure 2 .
Figure 2. CryoENsemble reweighting of ADK.(A) Open state populations calculated from cryoENsemble reweighting of the ADK dataset.The open state populations obtained after structural ensemble reweighting for each ADK dataset are shown in orange for the standard mode and green for the iterative mode, along with the target values (black circle).(B) Correlations between reference maps and posterior maps upon cryoENsemble reweighting of the ADK dataset in standard (orange) and iterative mode (green), including values for the prior ensemble and the best single structure.The datasets vary in resolution, noise level, and reference populations of the open state.

Figure 3 .
Figure 3. CryoENsemble reweighting of the FLN5-6 nascent chain dataset.(A) Correlations between the reference maps and posterior maps upon standard and iterative cryoENsemble reweighting of the FLN5-6 nascent chain dataset.Correlation coefficients calculated between the FLN5-6 nascent chain reference density maps and maps obtained before and after the reweighting, as well as the maps derived from the best single structure fitted into the reference density map.The 100 reference density maps (at resolutions of 3, 6, and 10 Å, and with noise levels of 1% and 10%) were generated based on ten randomly selected structures from the MD ensemble.(B) Weights obtained upon standard cryoENsemble reweighting of the FLN5-6 nascent chain dataset.Examples of the reweighting process for FLN5-6 nascent chain based on the reference map (at resolutions of 3, 6, and 10 Å, and with noise levels of 1% and 10%).Weights are calculated with different theta (θ) values ranging from 0 to 10 7 , and with black lines, we depict optimal weights selected based on the L-curve analysis.Additionally, weights corresponding to the ten models used to generate the reference map are circled.

Figure 4 .
Figure 4. CryoENsemble reweighting can retrieve local dynamics from cryo-EM maps.The root-mean-square deviations calculated between the RMSF of the target ensemble and the RMSFs from the prior ensemble (in grey), standard reweighting (in orange), and iterative reweighting (in green).

Figure 5 .
Figure 5. CryoENsemble iterative reweighting of the TF dataset.(A) Analysis of the effect of reweighting on the weights of each cluster obtained from the MD simulations.The green circle size is proportional to the population of the cluster upon reweighting.(B) The structural model with the highest weight selected by cryoENsemble (Supplementary Fig. 30) is visualised in two different orientations inside the cryo-EM map.(C) The three main states obtained upon reweighting corresponding to the cluster_1, cluster_2 and cluster_14, along with their populations.(D) RMSF calculated on the reweighted ensemble and mapped on the structure of TF. (E) Visualisation of the principal mode PC1 that captures the most dominant motion (indicated by blue arrows) within the reweighted ensemble.

of the prior structural ensemble and synthetic cryo-EM densities for the FLN5-6 + 31 RNCs
based on the X-ray structures of either the open or closed state.These two ensembles encapsulate the local dynamics around the native state of the apo or holo form.The combined structural ensemble consists of a total of 100 ADK conformations, with 50 randomly selected from each simulation.